CS T680 Software Analytics

Course Syllabus


  1. General Information


    Instructor: Dr. Preetha Chatterjee

    Instructor Office Hours: 1160 in 3675MK or via Zoom, by appointment

    Term: Spring 2023

    Credits: 3-hour lecture [3 credits]


  2. Student Learning Information

    1. Course Description:

      Software repositories archive valuable software engineering data, such as source code, execution traces, historical code changes, mailing lists, bug reports, and chats. This data contains a wealth of information about a project’s status and history. By doing data science on software repositories, researchers can gain an understanding of software development practices, and practitioners can better manage, maintain, and evolve complex software projects.

      In recent years, advances in Machine Learning (ML) and Natural Language Processing (NLP) have not gone unnoticed in the field of Software Engineering (SE). Researchers have applied software analytics techniques to tasks such as code summarization, code comment generation, question-answer extraction, and sentiment analysis.

      CS T680 aims to give students a deep understanding of and a hands-on approach to how ML and NLP techniques are used to represent knowledge and solve existing SE problems in novel ways.


    2. Prerequisites:

      • Students should be able to code in Python.

      • Students should be familiar with using GitHub.

      • Familiarity with basic machine learning and natural language processing techniques is preferred but not required.


        2.3. Statement of Expected Learning

        Course Objective: The overall goal of this course is for you to participate collaboratively in a research project scoped to make significant progress through the research process within this term. You will explore and creatively problem-solve with one or more software engineering-related datasets that will be made available to you.

        Learning Outcomes: This course will enable students to

        • Analyze related work in the area of Software Analytics.

        • Apply machine learning and natural language processing techniques to software engineering-related datasets.

        • Analyze data from software repositories and extract new insights (i.e., mining software repositories).

        • Evaluate the applicability of results in the software analytics literature on practical problems.


        2.4. Course Purpose

        This course counts towards the depth requirements of a CS Ph.D. degree and will count towards CS, AI, and SE electives. Similarly, this course counts towards the requirements of an M.S. degree and will count as an elective for the MS-CS, MS-SE, or MS-AIML degrees. This course will be particularly useful for students with research interests in software engineering, machine learning, and natural language processing.


        2.5. Tools/Programming Language

        This course uses Python 3. You may use any IDE of your choice.


  3. Course Materials

    Required Readings


    References (for programming and data science; optional)

  4. Assignments and Final Project

    The course grading is focused on assignments (e.g., responses to readings) and a final project.


    Assignment submission requirements:

    Each week (with a few exceptions), you will need to respond to one or more research papers no later than 11:59 PM the day before class. The idea is to read the paper, write a summary (more instructions on this later), and, most importantly, come up with 2-3 interesting questions to discuss in class.


  5. Grading

    Grading Matrix:


    There will be no exams in this course.


    The instructor reserves the right to make modest adjustments (5% or 10% for a category) in the weighting.

    The following scale will be used to convert points to letter grades:



    Points       Grade
    97-100       A+
    92-96.99     A
    90-91.99     A-
    87-89.99     B+
    82-86.99     B
    80-81.99     B-
    77-79.99     C+
    72-76.99     C
    70-71.99     C-
    67-69.99     D+
    60-66.99     D
    0-59.99      F

    Note that the instructor may revise this conversion if/when necessary.
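
    As an illustration only, the points-to-letter conversion above can be sketched as a small Python lookup. The function name `letter_grade` and the rejection of scores outside 0-100 are assumptions made for this example, not part of the course policy. The sketch also assumes each band starts at its listed lower bound, so a score such as 96.995 falls in the A band.

```python
# Illustrative sketch of the grade scale listed in this syllabus.
# Assumptions (not stated in the syllabus): each band is inclusive of its
# lower bound, and scores outside 0-100 are rejected.

# (lower bound, letter) pairs, ordered from highest band to lowest.
GRADE_SCALE = [
    (97.0, "A+"),
    (92.0, "A"),
    (90.0, "A-"),
    (87.0, "B+"),
    (82.0, "B"),
    (80.0, "B-"),
    (77.0, "C+"),
    (72.0, "C"),
    (70.0, "C-"),
    (67.0, "D+"),
    (60.0, "D"),
    (0.0, "F"),
]


def letter_grade(points: float) -> str:
    """Return the letter grade for a point total between 0 and 100."""
    if not 0.0 <= points <= 100.0:
        raise ValueError(f"points must be between 0 and 100, got {points}")
    # The first band whose lower bound the score meets is the grade.
    for lower_bound, letter in GRADE_SCALE:
        if points >= lower_bound:
            return letter
    raise AssertionError("unreachable: the F band starts at 0")


if __name__ == "__main__":
    for score in (98.5, 91.0, 79.99, 60.0, 42.0):
        print(score, letter_grade(score))
```

    Because the instructor may revise the conversion, the thresholds are kept in a single list so that they can be changed in one place.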


    Attendance

    In-person section: Drexel’s stated policy is that course attendance is mandatory for students in the in-person section. I will not take attendance explicitly in every class, because it’s tedious and takes away time from actual material. But I do expect you to come, and if you are absent on a regular basis it will negatively affect your term grade, beyond the grading percentages above. That said, I understand things happen (you might get sick, your car might break down). If you’re in the in-person section and will miss class, please let me know.

    Online section: You are welcome to email me with questions related to the materials. I hope to interact with each one of you via Zoom, so please feel free to reach out to schedule an office-hours appointment.


  6. Course Schedule

    [This schedule is tentative and may change during the course.] Week by week:

    1. Introductions, Syllabus, Overview of Software Analytics, How to read a research paper


    2. Paper discussion and in-class activities

      Topic: Understanding Datasets in Software Engineering

      (projects could be based on you selecting to work on one or more of these datasets)

      1. Paper 1: GitterCom: A Dataset of Open Source Developer Communications in Gitter

      2. Paper 2: Apache Software Foundation Incubator Project Sustainability Dataset

      3. Paper 3: SOSum: A Dataset of Stack Overflow Post Summaries

    3. Initial project description and research plan (Student presentation)

      In-person section: Students need to present in class

      Online section: Students need to record their presentations and upload them on BBLearn.


    4. Paper discussion and in-class activities

      Topic: Qualitative and Empirical Studies

      1. Paper of the week: Do I Belong? Modeling Sense of Virtual Community Among Linux Kernel Contributors (ICSE ‘23 - Won Distinguished Paper Award)

      2. Optional reading: Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools (MSR ‘19)


    5. Paper discussion and in-class activities

      Topic: Emotions and Sentiment Analysis

      1. Paper of the week: Data Augmentation for Improving Emotion Recognition in Software Engineering Communication (ASE ‘22)

      2. Optional reading: “Did You Miss My Comment or What?” Understanding Toxicity in Open Source Discussions (ICSE ‘22 - Won Distinguished Paper Award)


    6. Project Update and Q&A (Student presentation). Template: Project Update Template Slides (Google Slides)

      In-person section: Students need to present in class

      Online section: Students need to record their presentations and upload them on BBLearn.


    7. Paper discussion and in-class activities

      Topic: Summarization/Classification

      1. Paper of the week: Automated Summarization of Stack Overflow Posts (ICSE ‘23)

      2. Optional reading: Automatic Extraction of Opinion-based Q&A from Online Developer Chats (ICSE ‘21)


    8. Paper discussion and in-class activities

      Topic: Open Source Software Sustainability

      1. Paper of the week: On the Self-Governance and Episodic Changes in Apache Incubator Projects: An Empirical Study (ICSE ‘23)

      2. Optional reading: Sustainability Forecasting for Apache Incubator Projects (FSE ‘21)


    9. Paper discussion and in-class activities

      Topic: Developer Productivity

      1. Paper of the week: Towards a Theory of Software Developer Job Satisfaction and Perceived Productivity (TSE ‘19)

      2. Optional reading: The SPACE of Developer Productivity (ACM Queue ‘21)

    10. Final Project Presentation (Student presentation)

      In-person section: Students need to present in class

      Online section: Students need to record their presentations and upload them on BBLearn.


  7. Academic Policies

Be careful about using ChatGPT or public code available on the internet. If you want to use an outside resource that you think would be useful, you must get the instructor's permission before using it. You must also cite any resource you use in your project.


This course follows university, college, and department policies, including but not limited to


The instructor(s) may, at his/her/their discretion, change any part of the course before or during the term, including assignments, grade breakdowns, due dates, and schedule. Such changes will be communicated to students via the course website. This website should be checked regularly and frequently for such changes and announcements.


Students requesting accommodations due to a disability at Drexel University need to request a current Accommodations Verification Letter (AVL) in the ClockWork database before accommodations can be made. These requests are received by Disability Resources (DR), which then issues the AVL to the appropriate contacts. For additional information, visit the DR website at drexel.edu/oed/disabilityResources/overview/, or contact DR by phone at 215.895.1401 or by email at disability@drexel.edu.